June 2026 | Fortaleza AI Blog

Prompt injection attacks have become one of the leading security threats facing enterprise AI deployments. Researchers have demonstrated that a single carefully crafted sentence can cause an AI assistant to ignore its safety instructions, leak confidential data, or execute unauthorized actions. In healthcare, the stakes are uniquely high: an attacker could manipulate a clinical AI to provide dangerous medical guidance or exfiltrate patient records through seemingly innocent conversation. Yet most AI platforms rely on a single layer of protection — usually just the LLM’s own instruction-following — which has been proven insufficient against adversarial techniques.

Fortaleza AI addresses this threat with a multi-layered security pipeline that scans every prompt and every response through multiple independent security engines before anything reaches the AI model. Layer 1 is LLM Guard, an ML-based scanning system that runs seven input scanners in sequence, ordered from cheapest to most expensive, which catches known jailbreak phrases through fast string matching, detecting accidentally pasted API keys and passwords, TokenLimit to prevent resource exhaustion, prompt injection which uses a DeBERTa-v3 ONNX classifier to detect adversarial inputs, Anonymize to mask PII using Presidio NER models, Toxicity filters against harmful language, and BanTopics blocks prohibited subject matter through zero-shot classification.

Layer 2 is NVIDIA’s NeMo Guardrails, which provides semantic and dialog-flow analysis that catches sophisticated attacks the pattern-matching layer would miss. NeMo uses dialog flows to enforce conversation boundaries, an LLM self-check that asks the model to evaluate whether an input is attempting manipulation, and Presidio-based PII detection on outputs. The two layers provide overlapping coverage: LLM Guard catches PII, secrets, and known attack patterns at machine speed, while NeMo catches jailbreak attempts via rephrasing, contextual safety violations, and multi-turn manipulation patterns that require understanding conversational context.

One of the most insidious problems we discovered and solved is what we call “history poisoning.” When a borderline prompt triggers a security warning but isn’t blocked outright, it gets saved to conversation memory. On the next turn, the security scanner sees the flagged content in the conversation history and scores it even higher, creating a cascading cycle that eventually blocks all legitimate requests. Our three-tier verdict system prevents this by tagging the conversations to be used later or set aside to not poison the rest of the conversations. Prompts that are on the edge are processed normally but excluded from conversation memory, breaking the cascade while maintaining security visibility through audit logs.

Healthcare organizations cannot afford to deploy AI without robust, multi-layered security. A single prompt injection in a clinical setting could compromise patient safety or trigger a HIPAA investigation. Fortaleza AI’s multi-layered pipeline, combined with our history poisoning prevention, provides the defense-in-depth that regulated industries require. The entire security stack runs on-premise with pre-downloaded models and requires no internet connectivity. If your organization needs AI that’s secured like the sensitive data it handles, visit www.fortalezaai.com to schedule a technical demonstration.

To learn more or schedule a demo, visit fortalezaai.com or contact our team at contact@fortalezaai.com.

Stopping Prompt Injection Before It Starts